ABSTRACT

Current large-scale topology mapping systems require multiple
days to characterize the Internet due to the large amount
of probing traffic they incur. The accuracy of maps from
existing systems is unknown, yet empirical evidence suggests
that additional fine-grained probing exposes hidden
links and temporal dynamics. Through longitudinal analysis
of data from the Archipelago and iPlane systems, in
conjunction with our own active probing, we examine how
to shorten Internet topology mapping cycle time. In particular,
this work develops discriminatory primitives that
maximize topological fidelity while being efficient.

We propose and evaluate adaptive probing techniques that
leverage external knowledge (e.g., common subnetting structures)
and data from prior cycle(s) to guide the selection of
probed destinations and the assignment of destinations to
vantage points. Our Interface Set Cover (ISC) algorithm
generalizes previous dynamic probing work. Crucially, ISC
runs across probing cycles to minimize probing while detecting
load balancing and reacting to topological changes.
To maximize the information gain of each trace, our Subnet
Centric Probing technique selects destinations more likely
to expose their network’s internal structure. Finally, the
Vantage Point Spreading algorithm uses network knowledge
to increase path diversity to destination ingress points.

1. INTRODUCTION

The scale of the Internet makes obtaining representative
metrics and characteristics challenging. Compounding this
challenge, the Internet is poorly instrumented and lacks
measurement and management mechanisms [5], and providers
hide information. Researchers therefore must frequently make
inferences over limited available data, and may form false
conclusions [15].

Understanding the complex structure of the Internet is vital
for network research including routing, protocol validation,
developing new architectures, etc. More importantly,
building robust networks, and protecting critical infrastructure,
depends on accurate topology mapping.

While dedicated platforms exist to perform topology measurements,
e.g., [11, 19], these must balance induced measurement
load against model fidelity. Unfortunately, in practice,
such balancing results in multiple days' worth of measurement
to capture even an incomplete portion of the Internet.
Employing more vantage points is an effective technique
to improve topological recall [23], but does not reduce total
load or cycle time.

This work proposes primitives toward the eventual goal
of performing high-frequency active Internet topology measurement.
Measurement load hinders the ability to capture
small-scale dynamics and transient effects that occur at
frequencies higher than the measurement frequency, effectively
creating Nyquist aliasing loss. For example, recent work
[20] shows fewer than 50% of Internet paths remain stationary
across consecutive days. Our own analysis of set cover
techniques [10] finds that the rate of missed interfaces
increases in proportion to the time since the covering set was
created, implying that “train-then-test” methodologies are
insufficient.

Our work therefore focuses on two separable problems via
a unified methodology: how to i) select destinations in the
network to probe; and ii) perform the probe. We examine the
hypothesis that by leveraging external network knowledge,
e.g., routing, address structure, etc., and adaptive probing,
the active traffic load can be significantly reduced without
sacrificing the inferred topology fidelity. Our methodology
extends prior schemes, e.g., [8], which attempt to reduce
measurement overhead but are artificially parametrized, are
lossy, and ignore temporal effects across measurement periods.
Toward high-frequency Internet topology mapping, we:

1. Quantify unnecessary probing performed by production
topology measurement platforms.

2. Develop three algorithms that use network knowledge
to intelligently drive adaptive probing.

Figure 1: Edit distance (ED) distribution of Ark (~260k) and iPlane
(~150k) traces to different addresses within the same BGP prefix
compared to baseline ED between random trace pairs.
(a) Unnecessary probing (x-axis: Levenshtein edit distance): > 60% of
intra-BGP traces have ED < 3; fewer than 50% of random traces have
ED < 10.
(b) Contribution of last-hop AS to path variance (x-axis: Levenshtein
edit distance, last-hop AS removed): ~70% of probes to the same prefix
yield no information gain beyond the leaf AS.


2. UNNECESSARY PROBING

Several large topology measurement experiments have been
deployed, including CAIDA’s Skitter/Archipelago project
(Ark) [11, 13], iPlane [19], and DIMES [22]. To better understand
the challenges in topology mapping, this section focuses
on the existing practice of Ark and iPlane, which infer
interface-level topologies via traceroute-like [1, 17] probing.

So that measurement is tractable, production systems often
follow common assumptions over the Internet’s structure,
for instance by probing a target in each subnetwork of
size $2^8$ (herein referred to via common /24 prefix notation).
Ark subdivides all routed prefixes (i.e., visible in BGP) into
/24s. A “cycle” of probing is a complete set of measurements
to one destination address within each routed /24.
The probe target for a given /24 is randomly selected from
the $2^8$ possible addresses.

With approximately half of all IP addresses globally routable,
a cycle consists of ~$2^{31-8}$ /24s. Due to this large number of
/24s to be probed in a cycle, approximately 9M, Ark divides
the probing work among multiple vantage points (measurement
sites). Probing at a /24 granularity requires significant
time and load. With asynchronous, distributed probing to
mitigate per-path RTT variance, a full cycle requires multiple
days to partially characterize the Internet.

Traces can be distilled into an interface-level representation
of the Internet graph. Some traces yield more information
than others based on the choice of prior probes. For
instance, we expect traces to different addresses within the
same BGP prefix to be similar, while probes to very different
destination addresses are likely to have a higher information
gain.

2.1 A Path Pair Distance Metric

To quantify the information gain of intra-BGP traces, we
use the Levenshtein, or edit, distance, which is a measure of
the minimum number of insert, delete, or modify operations
required to equate two strings.

Let the alphabet of symbols be the unsigned 32-bit integer
space, $\Sigma = \{0, \ldots, 2^{32} - 1\}$. We compute the edit
distance (ED) between trace pairs using each IP address
along the path. An ED of zero implies that the two paths
are identical, whereas an ED of one implies that the two
traces differ by a single interface addition, subtraction, or
replacement. For example, for the following two interface
paths, $ED(t_1, t_2) = 2$:

$t_1$ = 1.2.6.1, 1.186.254.13, 2.245.179.52, 4.53.34.1

$t_2$ = 1.2.6.1, 2.245.179.52, 4.69.15.1

We use data from a single Ark and single iPlane monitor
in a January 2010 cycle for ED analysis. As a comparative
baseline, we also compute ED over an equal number of
random trace pairs.
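
The metric of this section can be illustrated with a minimal sketch. The function name `edit_distance` and the choice to represent each hop as a dotted-quad string (rather than a 32-bit integer) are assumptions made for readability; the computation is the standard dynamic-programming Levenshtein distance, applied to the two example paths given above.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two hop sequences (one symbol per interface)."""
    # At the start of iteration i, prev[j] is the distance between the
    # first i-1 hops of `a` and the first j hops of `b`.
    prev = list(range(len(b) + 1))
    for i, hop_a in enumerate(a, start=1):
        curr = [i]
        for j, hop_b in enumerate(b, start=1):
            cost = 0 if hop_a == hop_b else 1
            curr.append(min(prev[j] + 1,          # delete hop_a
                            curr[j - 1] + 1,      # insert hop_b
                            prev[j - 1] + cost))  # match or replace
        prev = curr
    return prev[-1]


t1 = ["1.2.6.1", "1.186.254.13", "2.245.179.52", "4.53.34.1"]
t2 = ["1.2.6.1", "2.245.179.52", "4.69.15.1"]
print(edit_distance(t1, t2))  # 2: one deletion plus one replacement
```

The quadratic cost per pair is negligible here because traceroute paths are short, typically a few tens of hops.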

2.2 Quantifying Unnecessary Probing

Figure 1(a) shows the cumulative fraction of path pairs in
Ark and iPlane as a function of ED. The ED is larger for
the randomly selected traceroute path pairs than the pairs
from within the same BGP prefix, as determined by a contemporaneous
Routeviews [21] BGP routing table. Approximately
60% of traces to destinations in the same BGP prefix
have ED < 3 while fewer than 50% of random traces have
ED < 10. Thus, as we intuitively expect, there is value to
using the BGP structure to drive the probe target selection
in order to maximize the information gain.

Next, we wish to quantify the contribution of the last hop
autonomous system (AS) to the edit distance of traces to the
same BGP prefix, i.e., path difference attributable to subnetting
within an AS. For example, Figure 2 depicts the sources
of path diversity observed as an “hourglass” with multiple
vantage points contributing to diversity into an AS’s ingress
points, and the degree of subnetting within the destination
AS contributing to the remaining diversity. The “waist” is
the set of ingress points for a prefix, which may be common
to multiple traces or require distributed vantage points in order
to be discovered (§4.3 discusses the diminishing return
of additional vantage points).

Figure 2: Topology information gain hourglass: path
diversity comes via multiple vantage points and via
multiple destinations in a prefix. The hourglass
“waist” is the AS ingress point(s).

Figure 1(b) is the result of an ED analysis after removing
interface hops belonging to the destination AS, as determined
by the Routeviews BGP table. We observe that
for ~70% of the probe pairs to the same prefix, there is
zero additional information gain beyond the leaf AS. Therefore,
from this off-line analysis of traces from two important
topology platforms, we conclude that there exist significant
possible packet savings by intelligently tuning, e.g., via
time-to-live (TTL), the set of hops each trace interrogates. For
instance, a basic tracing strategy might start with a TTL
suitable to reach the destination and iteratively decrement
the TTL until a previously discovered hop, i.e., at the AS
ingress, is found.
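
As a rough illustration of the basic backward tracing strategy just described, the sketch below walks a path from the destination toward the source and stops at the first interface already seen. It simulates probing over a known hop list instead of sending packets; the function name `backward_probe`, the single-letter interface labels, and the assumption that the TTL needed to reach the destination is known in advance are all hypothetical.

```python
from typing import List, Set


def backward_probe(path: List[str], known: Set[str]) -> List[int]:
    """Return the TTLs probed when walking backward from the destination.

    `path[ttl - 1]` stands in for the interface that would answer a probe
    sent with that TTL. Probing stops right after the first reply from an
    interface already in `known`; new interfaces are added to `known`.
    """
    probed = []
    for ttl in range(len(path), 0, -1):
        probed.append(ttl)
        hop = path[ttl - 1]
        if hop in known:
            break  # reached previously discovered topology, e.g. the AS ingress
        known.add(hop)
    return probed


known: Set[str] = set()
first = backward_probe(["A", "B", "C", "D", "E"], known)   # nothing known yet
second = backward_probe(["A", "B", "C", "X", "Y"], known)  # same ingress C
print(first, second)  # [5, 4, 3, 2, 1] [5, 4, 3]
```

In this toy case the second trace to the same prefix costs three probes instead of five, which is the kind of saving the ED analysis suggests is available.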

Moreover, in analyzing pairs of traceroutes to the same
destination prefix, but from different vantage points, we find
that in ~30% of the cases, entirely new paths are discovered.
Only approximately 10% of the probes from a new
vantage point yield fewer than four previously undiscovered
interfaces. Thus, there exists significant information gain
from additional vantage points.

These potential efficiencies have been recognized, most
prominently by the DoubleTree method [8, 7]. Unfortunately,
DoubleTree relies on heuristics to tune its probing.
In §3 we detail non-parameterized primitives designed to address
the low gain we find here and provide efficiency without
sacrificing inference power.

Note that the EDs for iPlane are higher than for Ark due
to a non-uniform distribution of traces to prefixes as part of the
iPlane logic [18]. Since iPlane provides significantly fewer
instances of multiple probes to the same prefix as compared
with Ark, we can more readily test our primitives against
the latter. We therefore use historic Ark data, as well as our
own active probing, for the remainder of this paper.

3. ADAPTIVE PROBING METHODOLOGY

This section presents three strategies to illustrate the potential
power of adaptive probing in reducing unnecessary
probing: 1) subnet centric probing; 2) interface set cover;
and 3) vantage point spreading.

3.1 Subnet Centric Probing

A naive strategy of leveraging BGP knowledge is to probe
exactly one destination within each advertised prefix. The
potential for using BGP routing information was first recognized
by Krishnamurthy and Wang [14]. While we show that
such an approach incurs approximately one-fifth of the normal
amount of probing packets sent by Ark, it is too aggressive
and misses significant topology information of networks
with a rich subnetting structure.

Intuitively, we expect two numerically consecutive IP addresses
to be more likely to share paths (and, hence, have a
low ED) than two distant addresses. But simply employing
address distance is too simplistic and does not capture typical
network subnetting structure [4]. For example, the two
IP addresses 18.255.255.254 and 19.0.0.0 have a numerical
distance of 2, but they would belong to different networks
unless both belonged to a single 18.0.0.0/7 subnetwork.

Instead, we propose to use the knowledge of how networks
are subnetted (the preceding example being a case where
subnetting is much more probable than no
subnetting) to select addresses to probe within each BGP
advertised prefix. The motivation is to adapt the number
of probes to the degree of subnetting within the prefix to
avoid wasted probing. We term this strategy “subnet centric.”
The current Ark strategy assumes a fixed subnetting
boundary, which may be too granular (wasted probing) or
too coarse (missing information). In contrast, we ensure that
subsequent destinations in a prefix are as distinct as possible
in their most significant bits, i.e., likely part of distinct
subnet prefixes. We term this selection of destinations the
least common prefix principle. For example, to choose four
probes for prefix 192.168.0.0/16, our algorithm initially picks
four addresses from the distinct prefixes: 192.168.0.0/18,
192.168.64.0/18, 192.168.128.0/18, and
192.168.192.0/18.
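
The initial choice under the least common prefix principle can be sketched with Python's standard `ipaddress` module. The helper name `least_common_prefix_targets` is hypothetical, and taking the first host address of each sub-prefix is an arbitrary assumption, since the text does not say which address inside each sub-prefix is probed.

```python
import ipaddress
from typing import List, Tuple

Pick = Tuple[ipaddress.IPv4Network, ipaddress.IPv4Address]


def least_common_prefix_targets(prefix: str, count: int) -> List[Pick]:
    """Pick `count` targets whose most significant bits differ as much as possible."""
    net = ipaddress.ip_network(prefix)
    # Extra prefix bits needed to obtain at least `count` equal-sized sub-prefixes.
    extra_bits = (count - 1).bit_length()
    subnets = list(net.subnets(prefixlen_diff=extra_bits))[:count]
    # Assumption: probe the first host address of each sub-prefix.
    return [(s, s.network_address + 1) for s in subnets]


picks = least_common_prefix_targets("192.168.0.0/16", 4)
for subnet, target in picks:
    print(subnet, target)
```

For the example in the text this yields the four /18 sub-prefixes of 192.168.0.0/16, one target in each.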

As probing progresses, we use our pair-wise ED measure to
determine whether finer-grained destinations within a prefix
are yielding useful additional information. The prefix is
continually probed until the ED value of the paths returned
by a pair falls within an empirically derived pre-determined
threshold $\tau = 3$.
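
The text does not spell out the exact refinement loop, so the following is only one possible reading of the stopping rule: split a prefix into halves, trace toward one address in each half, and stop refining once the pair-wise ED is at most 3. The names `refine`, `fake_trace`, and `TRUTH`, the recursive bisection, the "at most" comparison, and the /24 lower bound are all assumptions; the traces come from a synthetic table, not from real probing.

```python
import ipaddress
from typing import Callable, List, Sequence

Net = ipaddress.IPv4Network


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two interface paths."""
    prev = list(range(len(b) + 1))
    for i, hop_a in enumerate(a, start=1):
        curr = [i]
        for j, hop_b in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (hop_a != hop_b)))
        prev = curr
    return prev[-1]


def refine(net: Net, trace: Callable[[Net], Sequence[str]],
           tau: int = 3, max_len: int = 24) -> List[Net]:
    """Split `net` until the two sibling halves return paths within `tau` edits."""
    if net.prefixlen >= max_len:
        return [net]
    low, high = net.subnets(prefixlen_diff=1)
    if edit_distance(trace(low), trace(high)) <= tau:
        return [net]  # halves look alike: stop probing at finer granularity
    return refine(low, trace, tau, max_len) + refine(high, trace, tau, max_len)


# Synthetic ground truth standing in for real traceroute replies.
TRUTH = {
    ipaddress.ip_network("10.0.0.0/17"): ["a1", "a2", "a3", "a4", "a5"],
    ipaddress.ip_network("10.0.128.0/18"): ["b1", "b2", "b3", "b4", "b5"],
    ipaddress.ip_network("10.0.192.0/18"): ["c1", "c2", "c3", "c4", "c5"],
}


def fake_trace(net: Net) -> Sequence[str]:
    """Pretend to probe the first address of `net` and return its path."""
    for true_net, path in TRUTH.items():
        if net.network_address in true_net:
            return path
    raise ValueError(f"no synthetic path for {net}")


subnets = refine(ipaddress.ip_network("10.0.0.0/16"), fake_trace)
print([str(s) for s in subnets])
```

On this synthetic input the sketch stops at the /17 and the two /18s that actually differ, rather than probing every /24 in the /16.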

3.2 Interface Set Cover

DoubleTree [8] explores a method to adapt probing in real
time as a measurement cycle progresses. By beginning at a
heuristically chosen mid-point and working both back to the
vantage point (decreasing TTL) and toward the destination
(increasing TTL), DoubleTree achieves packet savings by
preempting a trace when a previously discovered interface
is observed; the inherent assumption is that subsequent
probes along the path will be duplicates of previous traces.

While DoubleTree’s technique partially addresses our findings
in §2, it must determine a path’s mid-point and does
not cope well with load balancing. More importantly, it
treats each cycle independently and is agnostic to information
learned in previous probing cycles. Our goal is to leverage
this knowledge to reduce the number of trace packets in
subsequent cycles.

We hypothesize a greedy Interface Set Cover (ISC) scheme
that always selects a subset of probing packets based on the
interface-level topology of the previous cycle. More specifically,
the interface-level topology includes directed edges
where the direction of the edge records the direction of a
probe. (It is not a problem if an edge is bi-directional, which
for the interface-level graph should occur only as outliers.)
The ISC scheme iteratively selects paths, and sub-paths,
from the directed interface-level topology of the previous
round, such that packets would probe interfaces that are
not yet accounted for by the paths already selected. The
initial “bootstrap” set of destinations may be chosen using
our subnet centric probing algorithm. We note that the
optimization problem of identifying a minimum subset of
paths to cover the interfaces discovered is an instance of the
well-known NP-complete “Min Set Cover” problem. However,
efficient greedy solutions have been shown to be within
a $\ln n$ approximation of optimal [9].

Formally, let $P$ be a set of paths from vantage points to
destinations. Each $p_i \in P$ is a vector of router interfaces
corresponding to the $i$'th path. Let the universe of interfaces
be $I = \bigcup_{i,j} p_i[j]$. A sub-path, $p_i[n:m]$, is the $n$ through
$m$'th hops of $p_i$, and includes the case of a full path (see
footnote 1). The size of sub-path $|p_i[n:m]|$ is $m - n + 1$, i.e., the
number of packets to probe hops $n$ through $m$. The ISC problem is to
find the set of sub-paths from $P$ with minimum total size
among all subsets of paths covering $I$. We thus contrast ISC
with the full set cover problem that finds a covering set of
paths $p_i$ from $P$ of minimum size.
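
To make the sub-path formulation concrete, here is a minimal sketch of one greedy heuristic for it. It is not necessarily the selection rule used by ISC: the scoring (new interfaces per probe packet), the restriction to one contiguous TTL range per selection, and the name `greedy_interface_cover` are illustrative assumptions. Paths are keyed by destination and hops are 1-indexed by TTL.

```python
from typing import Dict, List, Tuple

SubPath = Tuple[str, int, int]  # (destination, n, m): probe TTLs n through m


def greedy_interface_cover(paths: Dict[str, List[str]]) -> List[SubPath]:
    """Greedily pick sub-paths until every interface of the prior cycle is covered."""
    uncovered = {hop for hops in paths.values() for hop in hops}
    chosen: List[SubPath] = []
    while uncovered:
        best = None  # (new interfaces per packet, destination, n, m)
        for dest, hops in paths.items():
            idx = [i for i, hop in enumerate(hops) if hop in uncovered]
            if not idx:
                continue
            # Smallest TTL range of this path that spans all its uncovered hops.
            n, m = idx[0] + 1, idx[-1] + 1
            gain = len({hops[i] for i in idx}) / (m - n + 1)
            if best is None or gain > best[0]:
                best = (gain, dest, n, m)
        _, dest, n, m = best
        chosen.append((dest, n, m))
        uncovered -= set(paths[dest][n - 1:m])
    return chosen


paths = {
    "d1": ["A", "B", "C", "D"],
    "d2": ["A", "B", "C", "E"],
    "d3": ["A", "B", "F", "G"],
}
cover = greedy_interface_cover(paths)
packets = sum(m - n + 1 for _, n, m in cover)
print(cover, packets)  # 7 packets versus 12 for three full traces
```

In this toy input a full-path set cover still needs all three traces (12 packets), whereas sub-paths cover the same seven interfaces with 7 packets, which mirrors the contrast drawn above between ISC and the full set cover problem.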

We observe that a tension exists between the two conflicting
goals of reducing probing traffic and capturing dynamic
forwarding paths. Many networks deploy traffic engineering
and load balancing. Thus, regardless of whether only full
paths or sub-paths are used, we expect that probes will reveal
deviations from the prior cycle. When this occurs, we
augment ISC with a “change driven” logic: during the interface
verification phase, if an interface other than expected is
found, ISC begins a DoubleTree-like strategy probing outward
in both directions from the unexpected interface. This
allows ISC to not only learn of load balancing over multiple
cycles, but also adapt to underlying topological changes.

3.3 Vantage Point Spreading

Internet-scale mapping involves probing from dozens of
different vantage points (VPs). How to divide the probing
among VPs presents another opportunity for an adaptive
strategy to reduce the probing traffic. However, as our preceding
analysis in §2 shows, additional vantage points yield
more interface information. Further, in the next section we
find that the information gain of additional VPs to the same
destination decays slowly. Therefore, due to the large value in
any additional vantage points, we adopt a simple strategy
for assigning destinations to VPs.

Our vantage point spreading algorithm simply uses as many
distinct VPs as possible for the set of destinations within a
given BGP prefix. When combined with subnet centric probing,
as additional destinations are chosen from determined
subnet prefixes, vantage point spreading will assign them, if
possible, to VPs not yet used for the original BGP prefix,
or otherwise distribute them as uniformly as possible when
there are more destinations to be probed than VPs.
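
A minimal sketch of such an assignment is a per-prefix round-robin over the available VPs. The function name `spread_vantage_points`, the VP labels, and the rotation of the starting VP between prefixes (added here only to balance overall load) are assumptions, not details given in the text.

```python
from typing import Dict, List


def spread_vantage_points(dests_by_prefix: Dict[str, List[str]],
                          vps: List[str]) -> Dict[str, str]:
    """Assign each destination to a VP, using distinct VPs within a prefix."""
    assignment: Dict[str, str] = {}
    for offset, dests in enumerate(dests_by_prefix.values()):
        for i, dest in enumerate(dests):
            # Round-robin: VPs are distinct while len(dests) <= len(vps),
            # and as evenly reused as possible once VPs run out.
            assignment[dest] = vps[(offset + i) % len(vps)]
    return assignment


by_prefix = {
    "10.1.0.0/22": ["10.1.0.1", "10.1.1.1", "10.1.2.1"],
    "10.2.0.0/21": ["10.2.0.1", "10.2.1.1", "10.2.2.1", "10.2.3.1", "10.2.4.1"],
}
vps = ["vp-a", "vp-b", "vp-c"]
assignment = spread_vantage_points(by_prefix, vps)
for dest, vp in assignment.items():
    print(dest, vp)
```

With three VPs, the three destinations of the first prefix each get their own VP, while the five destinations of the second prefix reuse VPs with per-VP counts differing by at most one.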

Footnote 1: In practice, then, a trace sub-path has the same origin and
destination IP addresses, but uses TTL = n to m.


4. RESULTS

Having defined our three intelligent topology measurement
primitives, this section examines their individual performance
through a series of Ark experiments.

4.1 Subnet Centric Probing

We evaluate the subnet centric probing strategy against
a full cycle of Ark probing, which for the sake of this exercise
we will consider to be the known ground-truth. Note
that this ground-truth is a relative measure, rather than the
actual topology, which remains unknown. We simulate various
strategies by filtering the full Ark probe data from a
cycle, i.e., we simulate different resulting topologies by selectively
using different available Ark paths. Our performance
measures examine the balance between probing load and the
topological structure resulting from the probes.

To gain intuition over using external BGP data to drive
probe selection, we first follow a naive strategy similar to
[14]. We select a single destination per BGP prefix at random
from the available Ark traces, effectively assuming that
this single destination is representative of the entire cluster
of destinations that fall within its prefix. Similarly, we experiment
with an even coarser technique whereby trace destinations
are clustered according to their AS and a single
destination in the AS is deemed representative of the AS.

We build the interface-level graph as inferred by the raw
Ark data as well as using a single destination per prefix and
a single destination per AS. The degree distribution of the
resulting inferred graphs is given in Figure 3(a). While both
naive strategies capture a structure that appears similar to
ground-truth from the full Ark data, there are large numbers
of missing interfaces and edges (40-80% of total as shown in
Figure 3(b)). However, as shown in Figure 3(c), the prefix
clustering method requires approximately one-fifth of the
full probing load while the AS clustering method results in
even greater probe savings.

Armed with this intuition over the potential probe savings,
we ask whether the subnet centric probing algorithm
can strike a better balance between consumed probing load
and the fidelity of the resulting topology. We observe no
qualitative difference (see footnote 2) in the topology resulting
from the subnet centric approach versus the ground-truth in
Figure 3(a). The subnet centric algorithm is able to capture
> 90% of the ground truth vertices and edges while using less
than 60% of the ground truth full probing load.

Footnote 2: We omit other graph-theoretic measures [24, 16] for brevity,
but note that such metrics show similarity to ground truth.

Figure 4: Comparing full trace topology set covering and ISC techniques over time.


4.2 Interface Set Cover

Next, we examine the performance of the Interface Set
Cover algorithm, but excluding the “change driven” logic
(§3.2). In particular, we are concerned with how the performance
of ISC compares with full trace set cover, the degradation
of performance over time as the topology changes,
and comparative load metrics.

We select 20,000 routed IP destinations at random for
these experiments. Each day over a two-week period, we
probe the same set of destinations from the same vantage
point. The results from the first probing cycle are used
to “train” the full set cover and ISC. Figure 4(a) shows the
fraction of missing interfaces using each set cover technique
relative to the interfaces discovered from the full set of traces
in that cycle.

We see that after a single day, the full trace set cover
misses less than 1% of the interfaces while ISC misses approximately
2%. However, while the full trace set cover uses
approximately 60% of the ground-truth probing load, ISC
uses less than 20%, a substantial savings. Note that for this
comparison, we omit consideration of the last hop, the destination.
If the destination were included and given that
just one vantage point traces to a given destination (as is
the case with Ark), then the full trace set cover yields no
savings.

The performance of both set cover techniques degrades
over time, with ISC degrading faster, to 7% of interfaces missing
relative to ground-truth after 11 cycles. Thus, while set
cover techniques can provide a significant savings in probe
traffic, they alone do not suffice, as the topology changes
over time. For this reason, we augment ISC with “change driven” logic.
Our expectation, to be tested in future experiments, is that
the substantial additional savings in probe traffic with ISC,
as compared with full traces, will dominate the amount of
additional probing stimulated by the discovered deviations
(new and absent interfaces) from the prior cycle.


4.3 Vantage Point Influence

To gain intuition over how to assign destinations to vantage
points, we first perform a tightly controlled experiment
where 2000 randomly selected destinations were each probed
from 38 different vantage points. We wish to understand
whether adding vantage points to probe the same
destination increases the discovered topology, and at what
point the gain from additional vantage points (VPs)
diminishes. Figure 5(a) shows the average number of discovered
interfaces for each probed destination as a function
of the number of vantage points. In addition, the standard
deviation error bars show that the variance in discovered
interfaces increases as the number of probing vantage points
increases. We find that up to approximately ten vantage
points, the number of discovered interfaces grows linearly, after
which the influence of additional vantage points decreases.
Yet, the decrease is quite slow, suggesting again that there
is significant value in each additional vantage point. This
finding contrasts with earlier results [2], suggesting that AS-level
peering and interconnections have become richer [6].

Next, we examine vantage point spreading in the context
of two other strategies: “random”, which models Ark’s current
methodology, and “single”, which uses a single VP to
probe all /24s within a prefix. Figures 5(b) and 5(c) show
the number of vertices and edges in the inferred topology
using each strategy. As expected, the “single” strategy performs
poorly. And while the “random” assignment strategy
performs well, we achieve approximately 6% gain in leveraging
network knowledge via our VP spreading algorithm.

A reasonable goal for our primitives is substantial savings
in probing traffic while attaining as rich, or almost as rich,
an interface topology. For the latter, if we consider the criterion
of being within 1% of the number of interfaces discovered
with full traces, then the above 6% gain in interfaces is well
within the scope of concern.

Figure 5: Vantage Point spreading algorithm performance.

Analytically, for random assignment of /24s to VPs,
for a prefix with a mask of $m$ (i.e., $k = 2^{24-m}$ /24s in the prefix),
and for $N$ vantage points, where $k < N$, the probability
that all $k$ /24s are probed by a unique VP is:

$$P = \frac{N!}{(N-k)!\,N^k} = \prod_{i=0}^{k-1} \frac{N-i}{N}$$


Given 23 vantage points, there is only a 25% chance that the
8 /24s in a /21 would be assigned to distinct vantage points.
The chance for the 16 /24s in a /20 is 0.1%. Empirically,
our experiments find, on average, each of the 16 /24s in
a /20 prefix is hit by approximately 12 unique VPs, when
performing assignment at random. In summary, vantage
point spreading is simple and imposes no additional probing
load, and yet the resulting use of additional vantage points
attains worthwhile improvement in the estimated topology.
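
The two percentages quoted above can be checked numerically. Assuming each /24 is assigned independently and uniformly at random to one of N vantage points, the probability that k assignments are all distinct is the product of (N - i)/N for i = 0, ..., k - 1, the same calculation as in the birthday problem. The function name `prob_all_distinct` is an assumption for illustration.

```python
from math import prod


def prob_all_distinct(k: int, n_vps: int) -> float:
    """Probability that k uniform random VP assignments are all distinct."""
    return prod((n_vps - i) / n_vps for i in range(k))


p_21 = prob_all_distinct(8, 23)    # the 8 /24s of a /21
p_20 = prob_all_distinct(16, 23)   # the 16 /24s of a /20
print(f"/21: {p_21:.3f}  /20: {p_20:.4f}")  # roughly 0.25 and 0.0008
```

Both values agree with the 25% and 0.1% figures in the text, which supports reading the analytic expression as this distinct-assignment probability.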

5. DISCUSSION

Reducing the number of measurements required to infer
network topologies has been explored in the past, notably
in DoubleTree [8]. However, our primitives are the first to
exploit structural knowledge of the network to reduce measurement
cost, while the ISC algorithm is the logical extension
of DoubleTree to the multiple-round tracing scenario.

Prior work [12] examines using externally generated and
collected synthetic network coordinates to iteratively select
probe destinations where the topological distance is most
different from the inferred Euclidean distance. While their
ultimate goal of reducing measurement cost is the same as
ours, their problem formulation entails constructing efficient
overlay topologies among a known set of nodes by inferring
their underlay connectivity. In contrast, we leverage external
network knowledge to guide the selection of destinations
for topology characterization of an entire AS.

In the big picture, we view the preceding techniques as
important building blocks for a new generation of “Internet-scopes”
capable of performing one complete round of probing
within a day. With the substantial load savings of these
primitives, our hope is to utilize the resulting probing budget
gain to more completely characterize the Internet, capturing
small-scale dynamics and previously hidden structure.

One  challenge  in  combining  these  primitives  into  a  single 
system  design  is  that  the  ISC  technique,  by  nature,  has  its 
search  space  constrained  by  historical  views.  To  capture 
the  changes  in  Internet  topology,  the  supplemental  “change 
driven”  logic  needs  to  be  integrated  into  ISC  and  will  likely 
need  further  refinement. 

We also note the complementary interaction between subnet
centric probing and vantage point spreading. In isolation,
VP spreading probes discover the network ingress points
while subnet centric probing finds internal network subnetting
structure. Used together, however, both goals can be
accomplished without exhausting probing budgets. Subnet
centric probing is used for stub networks that have a limited
number of ingress points whereas vantage point spreading
is designed for exploring path diversity of transit networks
that have many peering points but not many internal subnets.
In other words, we do not need to perform subnet
centric probing per vantage point; we can use the same set
of probes to accomplish both objectives, by independently
choosing their source and destination addresses.

Our abstraction of the narrow waist in Figure 2, and its
impact on topology measurement strategy and vantage point
selection, is less relevant for core networks. A top-tier network
peers with the other top-tiers, in multiple cities, and
provides transit for its many downstream networks. Since
these interconnections often occur at inter-exchange points,
the number of border router interfaces of a top-tier network,
though more than for an edge network, is less than the number
of its connections to other ASes. Thus, discovering the
topology of a core network, for which additional vantage
points are key, has less opportunity for reduction in probing
than do edge networks. We intend to quantify the extent
of probe reduction possible in measuring core topologies in
future work.

Finally,  this  paper  only  targets  an  interface-level  graph. 
An  additional  alias  resolution  [3]  step,  with  more  probing,  is 
required  to  reduce  an  interface-level  graph  to  a  router-level 
graph.  We  leave  the  question  of  how  to  efficiently  perform 
alias  resolution  to  future  work. 